A Bayesian Approach to Graph Regression with Relevant Subgraph Selection
Abstract
Many real-world applications with graph data require both solving a given regression task and identifying the subgraphs that are relevant to it. In these cases, graphs are commonly represented as high-dimensional binary vectors of subgraph indicators. However, because the dimensionality of such indicator vectors can be high even for small datasets, traditional regression algorithms become intractable, and earlier approaches preselected a feasible subset of subgraphs. A different approach was recently proposed in the form of a Lasso-type method, in which the optimization of an objective function over a large number of variables is reformulated as a dual mathematical programming problem with a small number of variables but a large number of constraints. The dual problem is then solved by column generation, where the subgraphs corresponding to the most violated constraints are found by weighted subgraph mining. This paper extends that method to a Bayesian approach in which the regression parameters are treated as random variables and integrated out of the model likelihood, yielding a posterior distribution over the target variable rather than a point estimate. We focus on a linear regression model with a Gaussian prior distribution on the parameters. We evaluate our approach on several molecular graph datasets and analyze whether the uncertainty in the target estimate, given by the variance of the target posterior distribution, can be used to improve model performance and therefore provides useful additional information.
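The Bayesian step described in the abstract can be illustrated with a minimal sketch of textbook Bayesian linear regression on binary subgraph-indicator vectors. This is not the paper's implementation: it assumes a zero-mean Gaussian prior with a known precision `alpha` and a known noise precision `beta`, it takes the indicator matrix as given (the column generation and weighted subgraph mining steps are left out entirely), and the function names and toy data are hypothetical. It is written in plain Python to stay dependency-free.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def posterior_predictive(X, y, x_new, alpha=1.0, beta=25.0):
    """Return the mean and variance of the target posterior for x_new.

    Assumed model (not taken from the paper): weights w ~ N(0, I / alpha)
    and targets y = w . x + noise, with noise variance 1 / beta.
    """
    d = len(X[0])
    # Posterior precision of the weights: A = alpha * I + beta * X^T X.
    A = [[beta * sum(row[i] * row[j] for row in X) + (alpha if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    # The posterior mean m of the weights solves A m = beta * X^T y.
    rhs = [beta * sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    m = solve(A, rhs)
    mean = sum(mi * xi for mi, xi in zip(m, x_new))
    # Predictive variance: noise variance plus weight uncertainty x^T A^{-1} x.
    a_inv_x = solve(A, x_new)
    var = 1.0 / beta + sum(xi * vi for xi, vi in zip(x_new, a_inv_x))
    return mean, var


# Hypothetical toy data: each row marks which of three subgraphs a graph contains.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 0], [1, 1, 0]]
y = [1.0, 2.0, 1.1, 1.9]

mean_seen, var_seen = posterior_predictive(X, y, [1, 1, 0])
mean_new, var_new = posterior_predictive(X, y, [1, 0, 1])
print("seen pattern:  ", mean_seen, var_seen)
print("unseen pattern:", mean_new, var_new)
```

In this toy setup the third subgraph never occurs in the training graphs, so a query containing it receives a larger predictive variance than a query built only from observed subgraphs. That variance is the kind of additional uncertainty information the abstract proposes to evaluate.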
Similar resources
A Bayesian Nominal Regression Model with Random Effects for Analysing Tehran Labor Force Survey Data
Large survey data are often accompanied by sampling weights that reflect the unequal selection probabilities of samples in complex sampling. Sampling weights act as an expansion factor that, by scaling the subjects, makes the sample representative of the community. The quasi-maximum likelihood method is one of the approaches for considering sampling weights in the frequentist framewo...
A New Hybrid Framework for Filter based Feature Selection using Information Gain and Symmetric Uncertainty (TECHNICAL NOTE)
Feature selection is a pre-processing technique used for eliminating the irrelevant and redundant features which results in enhancing the performance of the classifiers. When a dataset contains more irrelevant and redundant features, it fails to increase the accuracy and also reduces the performance of the classifiers. To avoid them, this paper presents a new hybrid feature selection method usi...
Bayesian Sample size Determination for Longitudinal Studies with Continuous Response using Marginal Models
Introduction: Longitudinal study designs are common in many areas of scientific research, especially in the medical, social and economic sciences. The reason is that longitudinal studies allow researchers to measure changes in each individual over time and often have higher statistical power than cross-sectional studies. Choosing an appropriate sample size is a crucial step in a successful study. A st...
A BAYESIAN APPROACH TO COMPUTING MISSING REGRESSOR VALUES
In this article, Lindley's measure of average information is used to measure the information contained in incomplete observations on the vector of unknown regression coefficients [9]. This measure of information may be used to compute the missing regressor values.